Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add internal definitions for S3 ls recursive #4467

Merged
merged 36 commits into from
Dec 22, 2023
Merged

Conversation

shaunrd0
Copy link
Contributor

@shaunrd0 shaunrd0 commented Oct 27, 2023

This adds internal definitions for sm::VFS::ls_recursive and S3::ls_filtered for S3 only. Results are collected using a S3Scanner class that manages fetching results from S3, and advances LsScanIterator until the next accepted result is reached. The S3Scanner::iterator() method returns an iterator for the scan which can be passed to STL constructors or algorithms using input iterators.


TYPE: FEATURE
DESC: Add internal definitions for S3 ls recursive

@shortcut-integration
Copy link

Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For as much feedback as I've left, I consider this PR in pretty good shape.

  • class VFSTest needs to be made more compliant with C.41. It can't entirely yet because it needs to deal with "not supported" compile environments
  • There's some other smallish polishing items.

The most significant thing is going to be to change the callback type to a template argument. This is done all over <algorithm>, for example. This will ease the next step of hooking this code up by wrapping the C callback that will come in through the API.

test/src/unit-vfs.cc Show resolved Hide resolved
test/src/unit-vfs.cc Outdated Show resolved Hide resolved
test/support/src/vfs_helpers.cc Outdated Show resolved Hide resolved
test/support/src/vfs_helpers.cc Outdated Show resolved Hide resolved
test/support/src/vfs_helpers.h Outdated Show resolved Hide resolved
tiledb/sm/filesystem/ls_callback.h Outdated Show resolved Hide resolved
tiledb/sm/filesystem/ls_callback.h Outdated Show resolved Hide resolved
tiledb/sm/filesystem/ls_callback.h Outdated Show resolved Hide resolved
tiledb/sm/filesystem/vfs.h Show resolved Hide resolved
tiledb/sm/filesystem/vfs.h Outdated Show resolved Hide resolved
@shaunrd0 shaunrd0 force-pushed the smr/sc-35767/vfs-ls-cb-s3 branch from 85ca26d to 03bddbf Compare November 3, 2023 18:27
@shaunrd0 shaunrd0 force-pushed the smr/sc-35767/vfs-ls-cb-s3 branch from 03bddbf to a9c3195 Compare November 3, 2023 20:18
Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We need to purge the word "callback" from most of this, along with its abbreviations. See specific comments.
  • I believe we should be able to have this PR not use #ifdef HAVE_S3 anywhere but in a few root places. It's too prevalent right now.

tiledb/sm/filesystem/ls_callback.h Outdated Show resolved Hide resolved
tiledb/sm/filesystem/ls_callback.h Outdated Show resolved Hide resolved
tiledb/sm/filesystem/ls_callback.h Outdated Show resolved Hide resolved
test/support/src/vfs_helpers.cc Outdated Show resolved Hide resolved
test/src/unit-vfs.cc Outdated Show resolved Hide resolved
test/src/unit-vfs.cc Outdated Show resolved Hide resolved
test/support/src/vfs_helpers.cc Show resolved Hide resolved
tiledb/sm/filesystem/s3.h Show resolved Hide resolved
@eric-hughes-tiledb
Copy link
Contributor

Previous review isn't complete. I pressed "send" too early. To be continued...

Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I think all the setup_test() functions should go. If they can't become the constructor body, then we need to identify the problems with initialization and solve them properly.

tiledb/sm/filesystem/ls_callback.h Outdated Show resolved Hide resolved
tiledb/sm/filesystem/ls_callback.h Outdated Show resolved Hide resolved
tiledb/sm/filesystem/ls_callback.h Outdated Show resolved Hide resolved
test/support/src/vfs_helpers.cc Outdated Show resolved Hide resolved
@eric-hughes-tiledb
Copy link
Contributor

I really hate the way github makes it so easy to send the review early.
Still reviewing.

Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a different interface for the function supplied as a selection predicate. The other issues are more about code structure and don't really affect this issue. Let's work on this one last.

test/support/src/vfs_helpers.cc Outdated Show resolved Hide resolved
tiledb/sm/filesystem/vfs.h Show resolved Hide resolved
tiledb/sm/filesystem/vfs.cc Outdated Show resolved Hide resolved
tiledb/sm/filesystem/s3.cc Outdated Show resolved Hide resolved
@eric-hughes-tiledb
Copy link
Contributor

Done for now.

@shaunrd0 shaunrd0 force-pushed the smr/sc-35767/vfs-ls-cb-s3 branch from 194b8e3 to 8da19b7 Compare November 22, 2023 22:24
@shaunrd0
Copy link
Contributor Author

I addressed the remaining feedback and updated unit tests today.. As a final step I ran the VCF query in the original story and the 3202 results are collected but the iteration never ends, so there's an edge case here that needs fixed and coverage added. 🧐

@shaunrd0
Copy link
Contributor Author

shaunrd0 commented Dec 7, 2023

Fixed the bug I ran into before the holidays and added test coverage. I created a story for follow up to inherit from FilesystemBase since it's pure virtual and would require implementing all the methods in this PR. I also noticed some of the methods in FilesystemBase don't exist for cloud backends and vice versa, so there will likely be more refactoring for FilesystemBase once it's being used for a cloud backend and not just local Posix. I added more detail to the story with some example code to show what I'm thinking there. It's not a ton of work but seems mostly out of scope for this PR.

https://app.shortcut.com/tiledb-inc/story/38117

I'm not sure where ls_recursive fits into FilesystemBase since it's a template method that can't be virtual, so open to suggestions there. Otherwise all feedback has been addressed and this is ready for review 🙂

Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • All the new tests are at the unit level. We don't want new unit tests in tiledb_unit (much of which is not unit tests).
    • They should appear in the test directory for the unit, which is sm/filesystem/test.
    • Test support code should go in the most local directory possible. In this case this is the same directory.
    • After these changes, we shouldn't need to touch the /test directory at all.
    • There are two class VFS. Probably don't need two.
  • See https://en.cppreference.com/w/cpp/named_req/InputIterator.
    • The iterator doesn't need begin() or end().
    • Need an internal function ensure_dereferenceable() to throw when ptr_ == nullptr.

test/support/src/vfs_helpers.cc Show resolved Hide resolved
tiledb/sm/filesystem/s3.h Outdated Show resolved Hide resolved
test/support/src/vfs_helpers.h Outdated Show resolved Hide resolved
tiledb/sm/filesystem/ls_scanner.h Show resolved Hide resolved
@@ -201,6 +201,7 @@ set(TILEDB_CORE_SOURCES
${TILEDB_CORE_INCLUDE_DIR}/tiledb/sm/filesystem/uri.cc
${TILEDB_CORE_INCLUDE_DIR}/tiledb/sm/filesystem/vfs.cc
${TILEDB_CORE_INCLUDE_DIR}/tiledb/sm/filesystem/vfs_file_handle.cc
${TILEDB_CORE_INCLUDE_DIR}/tiledb/sm/filesystem/ls_scanner.cc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of enlarging TILEDB_CORE_SOURCES, we should be able to add vfs to TILEDB_CORE_OBJECTS_LIBS instead. This will require removing some sources. If this doesn't just work without a hitch, we'll do it in a later PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing, I ran into a good bit of linkage errors here today so created SC-38601 for follow up.

I addressed all other feedback here but didn't quite make it to relocating tests. I created SC-38602 for relocating tests out of tiledb_unit, but if this PR is open after the holidays I'm happy to give it a shot here.

test/support/src/vfs_helpers.h Show resolved Hide resolved
tiledb/sm/filesystem/ls_scanner.h Outdated Show resolved Hide resolved
tiledb/sm/filesystem/ls_scanner.h Show resolved Hide resolved
tiledb/sm/filesystem/ls_scanner.h Show resolved Hide resolved
tiledb/sm/filesystem/ls_scanner.h Outdated Show resolved Hide resolved
@eric-hughes-tiledb
Copy link
Contributor

Damn github UI. Still reviewing.

Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to the previous comments:

  • The end_ member variable is fundamentally broken. We can't use a pointer for the purpose that it's being put to. This is pretty specific to begin_ and end_. Their details need to be changed, but that's it. They're not being used in any conceptually incorrect way.

But for code organization (previously submitted), and this new issue, everything looks pretty good. Overall code structure is definitely in a good state.

tiledb/sm/filesystem/ls_scanner.h Show resolved Hide resolved
tiledb/sm/filesystem/ls_scanner.h Show resolved Hide resolved
tiledb/sm/filesystem/ls_scanner.h Show resolved Hide resolved
tiledb/sm/filesystem/ls_scanner.h Outdated Show resolved Hide resolved
tiledb/sm/filesystem/s3.h Show resolved Hide resolved
tiledb/sm/filesystem/s3.h Outdated Show resolved Hide resolved
tiledb/sm/filesystem/s3.h Show resolved Hide resolved
tiledb/sm/filesystem/s3.h Outdated Show resolved Hide resolved
tiledb/sm/filesystem/s3.h Show resolved Hide resolved
tiledb/sm/filesystem/s3.h Show resolved Hide resolved
@ihnorton
Copy link
Member

@eric-hughes-tiledb as long as the external API is finalized here, we need to wrap this up ASAP.

Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good enough for now.

We'll need to change class LsScanIterator to take an iterator type instead of an underlying data type when we generalize it past the current use only in S3. It works only because the iterator type declared (from vector) happens to match the one in the AWS library. That won't be true in general.

@@ -142,7 +169,9 @@ class LsScanIterator {
* @return Reference to this iterator after advancing to the next object.
*/
LsScanIterator& operator++() {
scanner_->next(ptr_);
if (++ptr_ != pointer()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a double increment, since next() is being called below. I'll let it pass for now since the unit tests pass, but this might be defective.

@KiterLuc KiterLuc merged commit b35ecdc into dev Dec 22, 2023
61 checks passed
@KiterLuc KiterLuc deleted the smr/sc-35767/vfs-ls-cb-s3 branch December 22, 2023 11:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants